Add Prometheus /metrics Support by vraiti · Pull Request #3362 · vllm-project/vllm-omni

vraiti · 2026-05-05T18:37:52Z

Purpose

Addresses #3228

Add Prometheus metrics for Omni multi-stage pipelines. Pipeline-level metrics (vllm:omni_*) track request counts, e2e latency, queue time, and per-stage diffusion timing breakdowns. Upstream vLLM per-engine metrics (vllm:*) are preserved for AR stages via PrometheusStatLogger.

To distinguish metrics generated by distinct engines, each metric has four labels:

model_name: The name of the model the engine is running
engine: A unique integer identifying that engine
stage_id: The index of the pipeline stage this engine is serving (can be non-unique within a replicated stage)
replica_id: A base-0 index unique to each replica within a stage

Workarounds

stage_id and replica_id actually can't be supported right now for LLM stages since the upstream PrometheusStatLogger does not accept custom metric labels. This is addressed in vllm-project/vllm#42743. Once that PR is merged, only the following will be required to provide LLM metrics with stage_id and replica_id: vraiti@fa7d3a9

Upstream unregister_vllm_metrics() (called inside PrometheusStatLogger.__init__) strips any collector whose name contains "vllm". There is currently a PR in vLLM to make this more precise: vllm-project/vllm#42331, but until then a new patch has been added to make unregister_vllm_metrics a no-op since the function was just there for CI purposes (it does not affect production use).

This PR also includes a make_stats() throttle (implemented in OmniSchedulerMixin) that limits SchedulerStats collection to once per second, eliminating a performance regression in LLM stages due to serialization overhead

Metrics added

Metric	Type
`vllm_omni:num_requests_running`	Gauge
`vllm_omni:num_requests_waiting`	Gauge
`vllm_omni:num_requests_success`	Counter
`vllm_omni:num_requests_fail`	Counter
`vllm_omni:e2e_request_latency_seconds`	Histogram
`vllm_omni:request_queue_time_seconds`	Histogram
`vllm_omni:diffusion_preprocess_time_ms`	Histogram
`vllm_omni:diffusion_exec_time_ms`	Histogram
`vllm_omni:diffusion_postprocess_time_ms`	Histogram
`vllm_omni:diffusion_step_time_ms`	Histogram

Test Plan

1. vLLM regression (Qwen3-Omni, 4x A100-40GB)

vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct \
  --omni --deploy-config vllm_omni/deploy/qwen3_omni_tp2.yaml --port 8000

Run text-only chat completions: 1 warmup request, 3 benchmark requests. Verify vllm:* metrics are present on /metrics.

2. Diffusion regression (Z-Image-Turbo, 1x A100-40GB)

vllm serve Tongyi-MAI/Z-Image-Turbo --omni --port 8000

Run 512x512 image generation at 8 diffusion steps: 1 warmup request, 5 benchmark requests. Verify v_llm_omni:* metrics are present and diffusion timing histograms are populated.

3. vLLM metrics (Qwen3-Omni)

Scrape /metrics from the Qwen3-Omni deployment and verify all expected vllm:* metric families are present (scheduler state, cache usage, token throughput, latencies).

4. Diffusion metrics (Z-Image-Turbo)

Scrape /metrics from the Z-Image-Turbo deployment and verify v_llm_omni:* pipeline metrics are present, diffusion timing histograms have observations, and no vllm:* metrics appear.

5. `engine_id` and `stage_id` metadata

Deploy diffusion and LLM pipelines with multiple replicas and observe correct

Test Result

1. vLLM regression (Qwen3-Omni)

	Baseline	feat_prometheus
Mean latency	3.08s	3.07s
Text tok/s	83.4	83.1

No regression observed

2. Diffusion regression (Z-Image-Turbo)

	Baseline	feat_prometheus
Mean latency	1.31s	1.31s

No regression observed.

3. vLLM metrics (Qwen3-Omni)

37 vllm:* metric families present. Full output: metrics_prometheus_qwen3.txt

4. Diffusion metrics (Z-Image-Turbo)

10 v_llm_omni:* metric families present. Diffusion timing histograms populated. No vllm:* metrics (expected — no AR engine). Full output: metrics_prometheus_z-image.txt

5. `engine_id` and `stage_id` metadata

Diffusion (Z-Image-Turbo):
metrics_with_replicas_diffusion.txt

LLM (

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
The test results. Please paste the results comparison before and after, or the e2e results.
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
(Optional) Release notes update. If your change is user-facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

chatgpt-codex-connector · 2026-05-05T18:37:57Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

hsliuustc0106 · 2026-05-05T21:16:09Z

Good addition of Prometheus /metrics support for observability. The new metrics for diffusion (preprocess, exec, postprocess, total time) will help with performance monitoring. Suggestions: 1) Consider adding metrics for request queue length/wait time. 2) Document which endpoints expose /metrics and how to configure Prometheus scraping. 3) Add tests to verify metrics are recorded correctly.

hsliuustc0106 · 2026-05-06T12:29:39Z

cc @bjf-frz @wuhang2014 PTAL

vraiti · 2026-05-06T15:40:40Z

Good addition of Prometheus /metrics support for observability. The new metrics for diffusion (preprocess, exec, postprocess, total time) will help with performance monitoring. Suggestions: 1) Consider adding metrics for request queue length/wait time. 2) Document which endpoints expose /metrics and how to configure Prometheus scraping.

v_llm_omni:num_requests_waiting is a queue depth gauge and v_llm_omni:request_queue_time_seconds is a queue time histogram.
/metrics is mostly exposed unconditionally when serving online, but docs/usage/metrics.md documents when each metric is exposed (just dependent on whether the pipeline includes the relevant stage engine type).

Is there something not covered here you'd like to see?

Add tests to verify metrics are recorded correctly.

I will add unit tests

wuhang2014

Thanks for the metrics work here — the overall direction looks good, but I found two blocking issues and one doc regression:

OmniPrometheusMetrics registers collectors on every instance creation, which is likely to break when multiple Omni/OmniBase instances are created in the same Python process due to duplicate Prometheus timeseries registration.
request_queue_time_seconds is defined, documented, and unit-tested synthetically, but the production path never passes queue_seconds into request_succeeded(), so the metric will never accumulate real observations.
docs/design/index.md appears to have dropped links to two existing feature design docs.

I left inline comments with details and suggested fixes.

wuhang2014 · 2026-05-06T23:13:37Z

+vLLM-Omni runs multiple engine instances (stages) within a single
+process, coordinated by an Orchestrator. The pipeline needs its own
+metrics — aggregate request counts, end-to-end latency across all
+stages, and diffusion timing breakdowns — that do not exist in upstream
+vLLM. These metrics must survive the `unregister_vllm_metrics()`
+cleanup pass. So, all pipeline-level metrics use the prefix `v_llm_omni:`.
+
+Upstream per-engine metrics retain the `vllm:` prefix and are
+registered by a `PrometheusStatLogger` instance that the Orchestrator
+creates and feeds directly.


I don’t think v_llm_omni: is the right long-term design here. The orchestrator already uses the correct pattern for upstream metrics: one PrometheusStatLogger with per-engine labels, not one logger per stage. The problem is the separate OmniPrometheusMetrics path in OmniBase, which registers fresh collectors per instance.

That separate registration path is what forces the custom prefix, because upstream unregister_vllm_metrics() removes any collector whose name contains "vllm". So a simple rename to vllm:* would still be fragile.

I’d suggest folding these omni metrics into the single orchestrator-owned/global Prometheus logger instead of registering a second collector set in OmniBase. Then they can live in the same namespace with names like vllm:omni_*, while stage/model separation stays in labels. That would avoid both duplicate registration and the prefix workaround.

It's not exactly one logger per stage. It's one logger for vLLM stages (PrometheusStatLogger) and one logger for diffuser stages/the pipeline (OmniPrometheusMetrics)

The separate OmniPrometheusMetrics class inside OmniBase is necessary because some vLLM metrics require per-iteration updates, while e2e pipeline statistics require the OmniBase/entrypoint context. For example:

num_requests_waiting: Requires reading the length of the OmniBase.request_states

num_requests_fail: Requires tracking in OmniBase since malformed HTTP requests can fail before they reach the orchestrator
Meanwhile, any metrics that depend upon vLLM's IterationStats logically should be collected at the orchestrator level, because that's where the per-iteration control occurs.

Additionally, I chose to keep the vLLM-Omni metrics separate from the vLLM metrics because PrometheusStatLogger is imported directly from vLLM. The reason vLLM metrics work seamlessly in this PR is that vLLM metrics infrastructure is (mostly) preserved, and trying to merge it with the vLLM-Omni-specific logic is likely to break that support or at least make it more brittle.

Folding OmniPrometheusMetrics into this class (i.e. subclassing PrometheusStatLogger in OmniPrometheusMetrics) would still require at least the following:

Collecting all generated vLLM IterationStats into a list within the orchestrator and passing this up to OmniBase

Moving PrometheusStatLogger.record() to OmniBase, where you would have to call it once per collected IterationStats collected

Creating a new method to record pipeline/diffusion statistics since PrometheusStatLogger.record() takes the vLLM-specific dataclasses IterationStats and SchedulerStats

Thanks, this makes sense.

I still don’t think v_llm_omni: is the right design here. It feels like a workaround for unregister_vllm_metrics() being too broad, rather than the right metric namespace.

I’d rather keep the metric names aligned with the vLLM namespace and add a small hack/fix in unregister_vllm_metrics() so it avoids unnecessary unregister, instead of encoding that workaround into a new long-term prefix.

Ok,I've submitted a vLLM PR to fix: vllm-project/vllm#42331 . This won't get merged until at least the next minor version, so I've added a patch that changes unregister_vllm_metrics to a no-op temporarily. Either way, vLLM-Omni metrics now use the prefix vllm:omni_

amy-why-3459 · 2026-05-07T03:26:10Z

Thank you very much for your contribution. Would you be interested in supporting performance data statistics for different stages in the benchmark? #1361

hsliuustc0106 · 2026-05-27T01:31:22Z

resolve conflicts bplease

vraiti · 2026-05-27T14:51:48Z

resolve conflicts bplease

@hsliuustc0106 done

hsliuustc0106

Reviewed the full diff (17 files, +871/-13). Solid PR overall — the three issues from wuhang2014's prior review are all resolved. Two new issues found that need attention, plus some smaller informational items.

Resolved from prior review

Duplicate collector registration → fixed (module-level singletons)
queue_seconds never populated → fixed (omni_base.py:990-992 now reads queue_wait_ms from pipeline_timings)
docs/design/index.md navigation regression → was a false positive (links were preserved)

What's good

Module-level collector singletons, replica_id/stage_id labels, thorough unit tests (CPU-runnable), well-structured design doc, documented monkey-patch with removal conditions

hsliuustc0106 · 2026-05-28T00:58:36Z

                                            vllm_config=vllm_config,
                                            executor_class=executor_class,
-                                            log_stats=False,
+                                            log_stats=True,


INFORMATIONAL: log_stats=True is now set unconditionally for all LLM stages (was False). The throttle in OmniSchedulerMixin.make_stats() protects omni schedulers, but if any pipeline uses a non-omni scheduler (no mixin), make_stats() runs unthrottled on every step — reintroducing the ZMQ serialization overhead this PR was designed to avoid.

Fix: Either confirm all LLM schedulers inherit OmniSchedulerMixin, or gate log_stats per stage type based on whether the scheduler has the throttle.

Both LLM schedulers (OmniARScheduler and OmniGenerationScheduler) both inherit from OmniSchedulerMixin, so this should be OK for now

hsliuustc0106 · 2026-05-28T00:58:36Z

+                    logger.debug(
                        "\n%s",
                        _format_table(
                            f"RequestE2EStats [request_id={rid}]",


INFORMATIONAL: logger.info → logger.debug for RequestE2EStats, StageRequestStats, TransferEdgeStats, and OmniTiming tables (also at lines 580, 625, 665). Users relying on these per-request timing logs at INFO level will lose visibility after this change. This is a separate behavioral change from the Prometheus metrics work — consider reverting to logger.info or documenting that OmniPrometheusMetrics is the intended replacement channel.

The downgrade to logger.debug is necessary because now with schedulers defaulting to log_stats=True, these logs are no longer gated behind --log-stats. If they remain at INFO level, they will by default appear in the server logs.

After this PR is merged, I am planning to submit a PR to add OTEL traces, which combined with Prometheus metrics should largely obviate metrics in the server logs.

Signed-off-by: vraiti <vraiti@redhat.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: vraiti <vraiti@redhat.com>

Signed-off-by: vraiti <vraiti@redhat.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: vraiti <vraiti@redhat.com>

…theusStatLogger as VllmPrometheusStatLogger Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: vraiti <vraiti@redhat.com>

…ed vllm:* series Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: vraiti <vraiti@redhat.com>

Signed-off-by: vraiti <vraiti@redhat.com>

Introduce vllm_omni/metrics/definitions.py as the single source of truth for metric family names, label sets, histogram buckets, and RTF formulas. Server-side prometheus.py and bench-side MultiModalsBenchmark Metrics now consume the same constants, eliminating the dual-track naming drift between /metrics output and bench CLI reports. Pre-work for the upcoming RFC extensions: per-modality audio/image/ video families (G1/G2), cross-stage transfer family (G3), and the OmniPrometheusStatLogger wrap that relabels engine into stage+replica (G7). All refactors are name-only; PR vllm-project#3362's existing 10 prometheus tests pass unchanged. Signed-off-by: LHXuuu <xulianhao.xlh@antgroup.com>

Phase 4 of multi-modal observability (RFC G3). Four Histogram families with {model_name, from_stage, from_replica, to_stage, to_replica} labels, emitted from the existing TransferEdgeStats accumulators in OrchestratorAggregator: transfer_size_bytes -- BYTES_BUCKETS, observed at TX hook transfer_tx_time_ms -- MS_BUCKETS, observed at TX hook transfer_rx_decode_time_ms -- MS_BUCKETS, observed at RX hook transfer_in_flight_time_ms -- MS_BUCKETS, observed at RX hook Each .observe_*() corresponds to one physical transfer event (one chunk hop) so the histogram tracks per-transfer distribution rather than request-aggregated sums. Two non-trivial design choices, both deviating from RFC §3.2.6: 1. Add model_name to the label set (RFC lists only the four stage/replica labels). Rationale: aligns transfer with the rest of the omni_* families (audio_*, image_*, video_*, num_requests_*) so PromQL joins on model_name work uniformly. Cardinality cost is zero for typical single-model deployments. To be amended back into the RFC as D11. 2. Resolve from_replica / to_replica by consulting the sticky-routing binding the orchestrator already maintains, rather than plumbing replica ids through TransferEdgeStats / StageRequestStats / connector adapters. Phase 2 (PR vllm-project#2396) gave us stage_pool.get_bound_replica_id(request_id), which returns the replica each request is bound to within a stage; we look it up at emit time. Five files would need touching otherwise (TransferEdgeStats, StageRequestStats, stage_pool.build_stage_metrics, adapter.on_forward call site, request bookkeeping). To be amended back into the RFC as D12. Five edits: 1. definitions.py — TRANSFER_LABELS prepends model_name. 2. transfer.py — new OmniTransferMetrics(model_name) class with the four Histogram families and one observe method per family. Same wrapper pattern as OmniPrometheusMetrics (PR vllm-project#3362) and OmniModalityMetrics (Phase 3): bind model_name at __init__, accept stage/replica at observe time. 3. stats.py — OrchestratorAggregator gains two optional kwargs at __init__ (transfer_emitter, replica_resolver). record_transfer_tx and record_transfer_rx call thin _emit_transfer_tx / _emit_transfer_rx helpers after the existing accumulation, which fail-safe (skip emit) when either dep is missing or the resolver returns None for either side. The TransferEdgeStats accumulation path is unchanged so existing log/aggregation consumers stay intact. 4. omni_base.py — instantiate self.transfer_metrics alongside self.prom_metrics / self.mod_metrics. 5. async_omni.py — wire transfer_emitter and replica_resolver into the per-request OrchestratorMetrics construction. New _resolve_transfer_replica helper looks up self.engine.stage_pools[s].get_bound_replica_id(rid). 15 unit tests cover the 4-family registration, observe APIs, bucket selection, multi-edge cardinality, plus the OrchestratorAggregator emit hooks (tx + rx full emit, defensive skips for emitter=None / resolver=None / one-side-resolves-to-None, rx skips stage 0). Stub emitter records call signatures so the routing logic is asserted without standing up a Prometheus registry. Note: TX-side hook (record_transfer_tx) is emit-ready but adapter.try_send_via_connector — its only caller — currently has no upstream invocation in the main code path (likely plugin-driven or about-to-land). RX side (record_transfer_rx) is the active path today; TX side will start emitting as soon as the connector pipeline activates. Signed-off-by: LHXuuu <xulianhao.xlh@antgroup.com>

…al{finished_reason} Phase 5 of multi-modal observability (RFC G6). Collapse the two PR-vllm-project#3362-era pipeline-level counters vllm:omni_num_requests_success vllm:omni_num_requests_fail into a single per-reason Counter: vllm:omni_requests_success_total{model_name, finished_reason} with finished_reason ∈ {stop, length, abort, ...} mirroring upstream vllm:request_success_total. The previous "fail" path now maps to finished_reason="abort" so the Pipeline-level success-rate dashboard becomes one query (sum by (finished_reason) ...) instead of needing two metrics joined. Breaking change: dashboards / alerts that reference the old num_requests_success / num_requests_fail counter names must migrate to the new requests_success_total{finished_reason} family. Four edits: 1. definitions.py — drop NUM_REQUESTS_SUCCESS, NUM_REQUESTS_FAIL. Add REQUESTS_SUCCESS (passed without _total since Counter auto-suffixes at exposition). 2. prometheus.py — replace _success_family + _fail_family with a single _completion_family using SUCCESS_LABELS = ("model_name", "finished_reason"). The label is bound per-call rather than at __init__ time. - OmniPrometheusMetrics.request_succeeded() gains a finished_reason="stop" default kwarg. - OmniPrometheusMetrics.request_failed() keeps its no-arg signature for back-compat with the cleanup call site, internally mapping to finished_reason="abort". 3. omni_base.py — at the request-finalize hook, extract finish_reason from engine_outputs.outputs[0].finish_reason (vLLM CompletionOutput convention) and pass it through to request_succeeded(). Falls back to "stop" when no completion output is present (defensive — e.g. diffusion stages). 4. tests/metrics/test_prometheus.py — refresh _PIPELINE_METRICS, exercise three reason buckets (stop x 2 + length x 1 + abort x 1) in the scrape fixture, replace the old success/fail-count assertions with per-reason ones, and adjust the histogram-count assertions (e2e/queue go from 2 to 3 since the abort path now only inc's the counter and doesn't observe latency). Signed-off-by: LHXuuu <xulianhao.xlh@antgroup.com>

PR vllm-project#3362 introduced docs/usage/metrics.md and docs/design/metrics.md covering its own pipeline-level + diffusion families. The follow-up work in this branch (G1 audio, G2 image/video, G3 cross-stage transfer, G6 success/fail merge into requests_success_total{finished_reason}, G7 OmniPrometheusStatLogger per-replica wrap) wasn't reflected. Refresh both docs to the final state. usage/metrics.md (+117 / -38): - Request Tracking table swaps num_requests_success / num_requests_fail for the merged requests_success_total{finished_reason} family. - New sections for Modality (audio / image / video) and Cross-Stage Transfer with the per-modality / per-edge metric tables. - vLLM Engine Metrics section gains a before/after example showing how the G7 wrap reshapes engine -> stage + replica, with a note that the ~37 upstream families gain per-replica visibility automatically. - Diffusion Engine Metrics section clarifies that omni-side diffusion families bypass the wrap (engine label = stage_id, not relabelled). - Pipeline Type availability matrix gains modality / transfer / per-replica rows. - Naming Convention section explains the RFC "co-position, different name" three-modality TTFP convention. design/metrics.md (+260 / -74): - Objectives section calls out the per-replica + modality + transfer goals introduced after PR vllm-project#3362. - Architecture diagram replaces the original two-component picture with the four-component one (OmniPrometheusMetrics + OmniModalityMetrics + OmniTransferMetrics + OmniPrometheusStatLogger). - Data Flow grows from two paths to four: Pipeline-level, Modality (finalize hook + audio_ttfp streaming hook), Cross-stage transfer (sticky-routing replica_resolver), and the wrapped per-engine path. The transfer subsection notes the TX-side hook is wired but currently inactive pending try_send_via_connector being called from the main code path. - New "OmniPrometheusStatLogger Wrap (G7)" section explains the four upstream impedance points and the three coordinated mechanisms (class-level metric class slots, per_engine_labelvalues property descriptor, _RelabelMixin.labels() override) used to handle them. Also documents the stage_replica_map construction and the deferred dynamic add/remove decision. - Metric Definitions table refreshed end-to-end: pipeline-level row now lists requests_success_total{finished_reason}; new modality and transfer subtables added; transfer subtable carries a note that model_name was added on top of the RFC-listed four stage/replica labels for cross-omni consistency. - Logging vs. Prometheus section expanded to mention OmniModality / OmniTransfer alongside OmniPrometheusMetrics, and notes that OrchestratorAggregator.record_transfer_tx/rx fans out to both the log accumulator and the Prometheus emit hook in one method body. Signed-off-by: LHXuuu <xulianhao.xlh@antgroup.com>

vraiti · 2026-06-01T18:23:17Z

Merged with 41a795b

vraiti requested a review from hsliuustc0106 as a code owner May 5, 2026 18:37

vraiti force-pushed the feat_prometheus branch 4 times, most recently from 8eaeb34 to ca092e5 Compare May 5, 2026 18:49

hsliuustc0106 requested a review from gcanlin May 6, 2026 12:29

vraiti force-pushed the feat_prometheus branch 3 times, most recently from 6fc5de6 to c0e86e9 Compare May 6, 2026 20:37

wuhang2014 reviewed May 6, 2026

View reviewed changes

Comment thread vllm_omni/metrics/prometheus.py

wuhang2014 reviewed May 6, 2026

View reviewed changes

Comment thread vllm_omni/entrypoints/omni_base.py Outdated

wuhang2014 reviewed May 6, 2026

View reviewed changes

Comment thread docs/design/index.md

wuhang2014 suggested changes May 6, 2026

View reviewed changes

wuhang2014 reviewed May 6, 2026

View reviewed changes

vraiti force-pushed the feat_prometheus branch from 291cb3a to c99cf65 Compare May 7, 2026 14:57

vraiti requested a review from wuhang2014 May 8, 2026 13:00

vraiti mentioned this pull request May 11, 2026

[Metrics] Scope unregister_vllm_metrics() to PrometheusStatLogger-owned metrics vllm-project/vllm#42331

Open

4 tasks

vraiti force-pushed the feat_prometheus branch from c99cf65 to f21781b Compare May 11, 2026 17:07

vraiti requested review from Gaohan123, david6666666, lishunyang12, tzhouam, yenuo26 and ywang96 as code owners May 11, 2026 17:07

LHXuuu mentioned this pull request May 12, 2026

[RFC]: Extend Prometheus with multimodal SLOs, cross-stage transfer, and per-replica labels #3545

Open

1 task

vraiti force-pushed the feat_prometheus branch 2 times, most recently from 973609e to edfadb5 Compare May 18, 2026 18:12

vraiti force-pushed the feat_prometheus branch 2 times, most recently from 684ff52 to 4798d8b Compare May 26, 2026 20:34

vraiti force-pushed the feat_prometheus branch 3 times, most recently from d858a76 to a9796d0 Compare May 27, 2026 14:50

vraiti force-pushed the feat_prometheus branch from 65076dc to dd3673f Compare May 27, 2026 19:19

hsliuustc0106 added the high priority high priority issue, needs to be done asap label May 28, 2026

hsliuustc0106 reviewed May 28, 2026

View reviewed changes

vraiti and others added 9 commits May 28, 2026 10:28

Add Prometheus metrics for multi-stage pipelines

d5a9286

Signed-off-by: vraiti <vraiti@redhat.com>

Add E2E test for Prometheus metrics under multi-replica config

00f4d8e

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: vraiti <vraiti@redhat.com>

Fix dict-style .get() on OutputMessage dataclass attributes

9a1ef63

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: vraiti <vraiti@redhat.com>

Fix per-engine stats dropping finished requests due to shared timer

1dd335b

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: vraiti <vraiti@redhat.com>

Downgrade per-request metric logs to DEBUG

8cb1cd8

Signed-off-by: vraiti <vraiti@redhat.com>

Delegate make_stats() to upstream Scheduler via super()

5d13fbc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: vraiti <vraiti@redhat.com>

Rename OmniPrometheusMetrics to OmniPrometheusStatLogger, alias Prome…

280c127

…theusStatLogger as VllmPrometheusStatLogger Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: vraiti <vraiti@redhat.com>

Exclude diffusion stages from engine_indexes to avoid stale zero-valu…

3d1beed

…ed vllm:* series Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: vraiti <vraiti@redhat.com>

Fix typo

40ec127

Signed-off-by: vraiti <vraiti@redhat.com>

vraiti force-pushed the feat_prometheus branch from dd3673f to 40ec127 Compare May 28, 2026 14:52

vraiti requested a review from hsliuustc0106 May 28, 2026 17:14

vraiti mentioned this pull request May 29, 2026

[RFC]: Add OTel Tracing #3995

Open

1 task

vraiti closed this Jun 1, 2026

Conversation

vraiti commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Workarounds

Metrics added

Test Plan

1. vLLM regression (Qwen3-Omni, 4x A100-40GB)

2. Diffusion regression (Z-Image-Turbo, 1x A100-40GB)

3. vLLM metrics (Qwen3-Omni)

4. Diffusion metrics (Z-Image-Turbo)

5. engine_id and stage_id metadata

Test Result

1. vLLM regression (Qwen3-Omni)

2. Diffusion regression (Z-Image-Turbo)

3. vLLM metrics (Qwen3-Omni)

4. Diffusion metrics (Z-Image-Turbo)

5. engine_id and stage_id metadata

Uh oh!

chatgpt-codex-connector Bot commented May 5, 2026

Uh oh!

hsliuustc0106 commented May 5, 2026

Uh oh!

hsliuustc0106 commented May 6, 2026

Uh oh!

vraiti commented May 6, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wuhang2014 left a comment

Choose a reason for hiding this comment

Uh oh!

wuhang2014 May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vraiti May 7, 2026

Choose a reason for hiding this comment

Uh oh!

wuhang2014 May 9, 2026

Choose a reason for hiding this comment

Uh oh!

vraiti May 11, 2026

Choose a reason for hiding this comment

Uh oh!

amy-why-3459 commented May 7, 2026

Uh oh!

hsliuustc0106 commented May 27, 2026

Uh oh!

vraiti commented May 27, 2026

Uh oh!

hsliuustc0106 left a comment

Choose a reason for hiding this comment

Resolved from prior review

What's good

Uh oh!

Uh oh!

Uh oh!

hsliuustc0106 May 28, 2026

Choose a reason for hiding this comment

Uh oh!

vraiti May 28, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

hsliuustc0106 May 28, 2026

Choose a reason for hiding this comment

Uh oh!

vraiti May 28, 2026

Choose a reason for hiding this comment

Uh oh!

vraiti commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

vraiti commented May 5, 2026 •

edited

Loading

5. `engine_id` and `stage_id` metadata

5. `engine_id` and `stage_id` metadata

wuhang2014 May 6, 2026 •

edited

Loading